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AN EFFICIENT FORWARD-REVERSE 
EXPECTATION-MAXIMIZATION ALGORITHM FOR STATISTICAL 
INFERENCE IN STOCHASTIC REACTION NETWORKS 

CHRISTIAN BAYER*, ALVARO MORAESt, RAUL TEMPONE*, AND PEDRO VILANOVA§ 


Abstract. In this work, we present an extension to the context of Stochastic Reaction Networks 
(SRNs) of the forward-reverse representation introduced in “Simulation of forward-reverse stochastic 
representations for conditional diffusions”, a 2014 paper by Bayer and Schoenmakers. We apply this 
stochastic representation in the computation of efficient approximations of expected values of func¬ 
tionals of SNR bridges, z.e., SRNs conditioned to its values in the extremes of given time-intervals. 
We then employ this SNR bridge-generation technique to the statistical inference problem of ap¬ 
proximating the reaction propensities based on discretely observed data. To this end, we introduce 
a two-phase iterative inference method in which, during phase I, we solve a set of deterministic 
optimization problems where the SRNs are replaced by their reaction-rate Ordinary Differential 
Equations (ODEs) approximation; then, during phase II, we apply the Monte Carlo version of the 
Expectation-Maximization (EM) algorithm starting from the phase I output. By selecting a set of 
over dispersed seeds as initial points for phase I, the output of parallel runs from our two-phase 
method is a cluster of approximate maximum likelihood estimates. Our results are illustrated by 
numerical examples. 

Key words. Forward-reverse algorithm, Monte Carlo EM algorithm, inference for stochastic 
reaction networks, bridges for continuous-time Markov chains. 
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1. Introduction. Stochastic Reaction Networks (SRNs) are a class of continuous¬ 
time Markov chains, A={A(t)}tg[o^T]) that take values in i.e., the lattice of 
d-tuples of non-negative integers. SRNs are mathematical models employed to de¬ 
scribe the time evolution of many natural and artificial systems. Among them we find 
biochemical reactions, spread of epidemic diseases, communication networks, social 
networks, transcription and translation in genomics, and virus kinetics. 

For historical reasons, the jargon from chemical kinetics is used to describe the 
elements of SRNs. The integer d>l is the number of chemical species reacting in our 
system. The coordinates of the Markov chain, X{t)={Xi{t),... ,Xd{t)), account for 
the number of molecules or individuals of each species present in the system at time 
t. The transitions in our system are given by a finite number J of reaction channels, 
{T^j)j=i- Each reaction channel TZj is a pair formed by a vector i/j of d integer 
components and a non-negative function aj(x) of the state of the system. Usually, lyj 
and Oj are named stoichiometric vector and propensity function, respectively. Because 
our state space is a lattice, our system evolves in time by jumping from one state to 
the next, and for that reason A is a pure jump process. 

The propensity functions, aj, are usually derived through the mass action prin¬ 
ciple also known as the law of mass action, see for instance Section 3.2.1 in [16]. For 
that reason, we assume that aj{x)=Cj gj{x), where Cj is a non negative coefficient and 
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gj{x) is a given monomial in the coordinates of the process, X. However, our results 
can be easily extended to polynomial propensities. 

In this work, we address the statistical inference problem of estimating the coef¬ 
ficients 0={ci,... ,cj) from discretely observed data, i.e., data collected by observing 
one or more paths of the process X at a certain finite number of observational times 
or epochs. It means that our data, V, is a hnite collection a^(in,m))}i where 

TO=1, 2,..., M indicates the observed path, n=l, 2,..., N{m) indicates the n-th ob¬ 
servational time corresponding to the m-th path, and the datum x{tn,m) can be 
considered as an observation of the m-path of the process X at time time tn,m- The 
observational times, are either deterministic or random but independent from 

the state of the process X. In what follows, we denote with Xi^n,m the *-th coordinate 
of X{tn^rn,^m), with X. ^^m the vector X{tn^rm^m), where Wm. is the TO-th path of 
the process X. 

Let us remark that we observe all the coordinates of X and not only a fixed subset 
at each observational time tn.m- In that sense, we are not treating the case of partially 
observed data where only a fixed proper subset of coordinates of X is observed. 

Remark 1.1. The partially observed case can in principle also be treated by a 
variant of the FREM algorithm based on [3] (Corollary 3.8). 

For further convenience, we organize the information in our data set, T), as a finite 
collection, 

(IT) V = ([sfc, 4], a;(sfe), a:(4))f=i, 

such that for each k, Ij- := [sfe,4] is the time interval determined by two consecutive 
observational points Sk and tfe, where the states x{sk) and x{tk) have been observed. 
Notice that the set T) collects all the data corresponding to the M observed paths of 
the process X. For that reason, it is possible to have [sfc,tfc]=[sfc',4'] for k^k', for 
instance, in the case of repeated measurements. 

For technical reasons, we need to define a sequence of intermediate times, {t).)^^^, 
for instance, could be the midpoint of [sk,tk]. 

It turns out that the likelihood function, lik°(d), corresponding to data obtained 
from continuously observed paths of X is relatively easy to derive (see Section 3.2). 
It depends on the total number of times that each reaction channel fires over the time 
interval [0,T] and the values of the monomials gj evaluated at the jump times of X. 
Since the observational times, tn.m, are not necessarily equal to the jump times of the 
process X, we can not directly deal with the likelihood lik'^(0). For that reason, we 
consider the Monte Carlo version of the expectation-maximization (EM) algorithm 
[7, 28, 33, 20] in which we treat the jump times of X and their corresponding reactions 
as missing data. The “missing data” can be gathered by simulating SRN bridges of 
the process X conditional on T), i.e., X{sk)=x{sk) and X{tk)=x{tk) for all intervals 
[sk,tk]. To simulate SRN bridges, we extend the forward-reverse technique developed 
by Bayer and Schoenmakers [3] for Ito diffusions to the case of SRNs. As explained 
in Section 2, the forward-reverse algorithm generates forward paths from Sk to and 
backward paths from tk to An exact SRN bridge is formed when forward and 
backward paths meet at Observe that the probability of producing SRN bridges 
strongly depends on the approximation of 6 that we use to generate the forward and 
backward paths. In addition to exact bridges, in this work we also relax this meeting 
condition by using a kernel k. 

Here, we present a two-phase algorithm that approximates the Maximum Likeli¬ 
hood Estimator, 0mle, of the vector 9 using the collected data, V. 


3 


Phase I is the result of a deterministic procedure while phase II is the result of 
a stochastic one. The purpose of phase I is to generate an estimate of 9 that will be 
used as initial point for phase II. To this end, in the phase I we solve a deterministic 
global optimization problem obtained by substituting at each time interval, 
the ODE approximations to the mean of the forward and reverse stochastic paths 
and minimizing a weighted sum of the squares of the Euclidean distances of the ODE 
approximations at the times Using this value as a starting point for phase II, we 
hope to simulate an acceptable number of SRN bridges in the interval [sk,tk] without 
too much computational effort. Phase I starts at and provides 9^^\ In phase II 
we run a Monte Carlo EM stochastic sequence until a certain convergence 

criterion is fulfilled. Here we have a schematic representation of the two-phase method: 

0(0) ^ 0(0) ^ 0(1) ^ ... 0(P) ^- 

During phase II, we intensively use a computationally efficient implementation 
of the SRN-bridge simulation algorithm for simulating the “missing data” that feeds 
the Monte Carlo EM algorithm. Details are provided in Section 4. Our two-phase 
algorithm is named FREM as the acronym for Forward-Reverse Expectation Maxi¬ 
mization. 

Although our FREM algorithm has certain similarity with the estimation method¬ 
ology proposed in [6], there are also notable differences. In terms of the similarity, 
in [6] the authors propose a two-phase method where the first phase is intended to 
select a seed for the second phase, which is an implementation of the Monte Carlo 
EM algorithm. While our first phase is deterministic and uses the reaction-rate ODEs 
as approximations of the SRN paths, theirs is stochastic and a number of parameters 
should be chosen to determine the amount of computational work and the accuracy 
of the estimates. There is also a main difference is the implementation of the second 
phase: while the FREM algorithm is focused in efficiently generating kernel-based 
SRN bridges using the novel forward-reverse technology introduced by Bayer and 
Schoenmakers in [3] , the authors of [6] propose a trial-and-error shooting method for 
sampling SRN bridges. This shooting method can be viewed as a particular case of 
the FREM algorithm by systematically choosing the intermediate point as the right 
extreme point tk, giving no place for backward paths. To quantify the uncertainty 
in our estimates, we prefer to have the outputs of our algorithm starting from a set 
of over dispersed initial points without assuming Gaussianity in its distribution (see 
[28]). The variance of our estimators can be easily assessed by bootstrap calculations. 
In our numerical experiments, we observe that the outputs lie on a low-dimensional 
manifold in parameter space; this is a motivation against the use of the Gaussiantiy 
assumption. Regarding the stopping criterion proposed in [6] , we found that the con¬ 
dition imposed there, of obtaining three consecutive iterations close to each other up 
to a certain tolerance, could be considered as a rare event in some examples and it 
may lead to the generation of an excessive number of Monte Carlo EM iterations. We 
refer to [6] for comparisons against other existing related statistical inference methods 
for SRNs. 

In [32] the authors propose a method based on maximum likelihood for param¬ 
eter inference. It is based on first estimating the gradient of the likelihood function 
with respect to the parameters by using reversible-jump Markov chain Monte Carlo 
sampling (RJMCMC) [15, 4] and then applying a gradient descent method to obtain 
the maximum likelihood estimation of the parameter values. The authors provide a 
formula for the gradient of the likelihood function given the observations. The idea of 
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the RJMCMC method is to generate an initial reaction path and then generate new 
samples by adding or deleting a set of reactions from the path using an acceptance 
method. The authors propose a general method for obtaining a sampler that can 
work for any reaction system. This sampler can be inefficient in the case of large 
observation intervals. At this point, we would like to observe that their approach can 
be combined with ours if, instead of using the RJMCMC method for computing the 
gradient of the likelihood function, we use our forward-reverse method. We think that 
this combination may be useful in cases in which many iterations of our method are 
needed (see Section 6.3 for such an example). This is left as future work. 

In the remainder of this section, we formally introduce SRNs and their reaction- 
rate ODE approximations, the stochastic simulation algorithm and the forward-reverse 
method. In Section 2, we develop the main result of this article: the extension of the 
forward-reverse technique to the context of SRNs. The EM algorithm for SRNs is 
introduced in Section 3. Next, in Section 4, we introduce the main application of 
this article: the forward-reverse EM (FREM) algorithm for SRNs. In Section 5, we 
provide computational details for the practical implementation of the FREM algo¬ 
rithm. Later, in Section 6, we present numerical examples to illustrate the FREM 
algorithm and finally, we present our conclusions in Section 7. Appendix A contains 
the pseudo-code for the implementation of the FREM algorithm. 

1.1. Stochastic Reaction Networks. Stochastic Reaction Networks are con¬ 
tinuous time Markov chains, A : [0,T] x 17 —>■ Z;(., that describe the stochastic evo¬ 
lution of a system of d interacting species. In this context, the i-th coordinate of the 
process X, Xi{t), can be interpreted as the number of individuals of species i present 
in the system at time t. 

The system evolves randomly through J different reaction channels TZj := (yj,aj). 
Each stoichiometric vector represents a possible jump of the system, x —>■ x-\-Vj. 

The probability that the reaction j occurs during an infinitesimal interval (t, t + dt) 
is given by 

(1.2) P (reaction j fires during {t,t + dt) | X{t) = x) = aj{x)dt o (dt), 

where Oj : —>■ [0,oo) are known as propensity functions. We set aj{x)=d for those 

x such that x+Vj ^ We assume that the initial condition of A, A(0) = xq G Z^f. 
is deterministic and known. The stoichiometric matrix v is defined as the matrix 
whose j-column is Vj denotes its transpose). The propensity vector a{x) € 
has aj(x) as components. 

Example 1.2 (Simple decay model). Consider the reaction A 0 where one 
particle is consumed. In this case, the state vector X{t) is in Z_|_ where X denotes 
the number of particles in the system. The vector for this reaction is v = —1. The 
propensity functions in this case could be, for example, a(X) = cX, where c > 0. 
Section 6 contains more examples of stochastic reaction networks. 

1.2. Deterministic Approximations of SRNs. The infinitesimal generator 
Cx of the process A is a linear operator defined on the set of bounded functions [8] . 
In the case of SRN, it is given by 

^xif){x) := aj{x)ifix + Vj) - fix)). 
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(1.3) 
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The Dynkin formula, (see [19]) 

(1-4) E [f{xm = /(X(0)) + /* E [Cx{f){s)] ds, 

Jo 

can be used to obtain integral equations describing the time evolution of any observ¬ 
able of the process X. In particular, taking the canonical projections fi{x) = Xi, we 
obtain a system of equations for E [Xi(t)], 

E[Xi(t)] = a;o-b [ E [aj-(X(s))] tyj^ids. 

Jo ^ 

If all the propensity functions, Uj, are affine functions of the state, then this system 
of equations leads to a closed system of ODEs. In general, some propensity functions 
may not depend on their coordinates x in an affine way, and for that reason, the 
integral equations for E[Xi(t)] obtained from the Dynkin formula depend on higher 
moments of X. This can be treated using moment closure techniques [12, 30] or by 
taking a different approach: using a formal first-order Taylor expansion of / in (1.3), 
we obtain the generator 

/Jz{f){x) :=Y^aj{x)d^f{x)vj, 
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which corresponds to the reaction-rate ODEs (also known as the mean field ODEs) 


f dZ{t) = i'a{Z{t))dt, t G K+, 
\ Z{0) = xq, 


where the j-column of the matrix v is Vj and a is a column vector with components 

aj. 

This derivation motivates the use of Z(t) as an approximation of E [^(t)] in phase 
I of our FREM algorithm. 

1.3. The Stochastic Simulation Algorithm. To simulate paths of the process 
X, we employ the stochastic simulation algorithm (SSA) by Gillespie [13]. The SSA 
simulates statistically exact paths of X, i.e., the probability law of any path generated 
by the SSA satisfies (1.2). It requires one to sample two independent uniform random 
variables per time step: one is used to find the time of the next reaction and the 
other to determine which is the reaction that fires at that time. Concretely, given the 
current state of the system, x := X(t), we simulate two independent uniform random 
numbers, Ui, U 2 ~ 1) and compute: 


k 

j = min G {1,..., J} : ^ai(x)>C/i ao(x)|, Tmin = - (ao(x))~^ In (C/ 2 ), 

i=l 

where ao(x) := o,j{.x). The system remains in the state x until the time /-l-Tmi„ 

when it jumps, X{t Tmin) = x Uj. In this way, we can simulate a full path of the 
process X. 

Exact paths can be generated using more efficient algorithms like the modified 
next reaction method by Anderson [1], where only one uniform variate is needed at 
each step. However, in regimes where the total propensity, aQ{x), is high, approxi¬ 
mate path-simulation methods like the hybrid Chernoff tau-leap [23] or its multilevel 
versions [25, 24] may be required. 
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1.4. Bridge Simulation for SDEs. In [3], Bayer and Schoenmakers intro¬ 
duced the so-called forward-reverse algorithm for computing conditional expectations 
of path-dependent functionals of a diffusion process conditioned on the values of the 
diffusion process at the end-points of the time interval. More precisely, let X = X(t), 
0 < t < T, denote the solution of a d-dimensional stochastic differential equation 
(SDE) driven by standard Brownian motion. Under mild regularity conditions, a 
stochastic representation is provided for conditional expectations of the form. 


H = E [g{X) I Xo = X, Xt = j/] , 

for fixed values x,y G and a (sufficiently regular) functional g on the path-space.^ 
More precisely, they prove an limiting equality of the form 

lim^^oE [g(X(/) oXW)s:,(X(/)(U) -XW(U))y] 

\im,^oE[K,{X(f){t*)-X(b)(t*))y] 

Here, is the solution of the original SDE (i.e., is a copy of X) started at X^^'> (0) = 
X and solved until some time 0 < t* <T. X^^'> is the time-reversal of another diffusion 
process Y whose dynamics are again given by an SDE (with coefficients explicitly given 
in terms of the coefficients of the original SDEs) started at Y{t*) = y and run until 
time T. Hence, starts at t* and ends at X^^\T) = y. We then evaluate the 
functional g on the “concatenation” X^f'^ o X^^'> of the paths X^^^ and which is 
a path defined on the full interval [0, T] defined by 


X(/) oX('')(s) 


X(/)(s), 0<s<U, 

X(^)(s), t*<s<T. 


In particular, we remark that X^^'^ o may exhibit a jump at t* . Here, y is an 
exponential weighting term of the form y = exp ^ c{Ys)ds^. At last, denotes a 

kernel with bandwidth e > 0. Notice that the processes and the pair (X^^^J^) 
are chosen to be independent. 

Let us roughly explain the structure of the representation (1.6). First note that 
the term on the right-hand side only contains standard (unconditional) expectations, 
implying that the right-hand side (unlike the left-hand side) is amenable to standard 
Monte Carlo simulation which is why we call (1.6) a “stochastic representation”. The 
denominator of (1.6) actually equals the transition density p(0, x, T, y) of the solution 
X, and its presence directly follows from the same term in the (analytical) definition 
of the conditional expectation in terms of densities. In fact, it was precisely in this 
context (i.e., in the context of density estimation) that Milstein, Schoenmakers and 
Spokoiny introduced the general idea for the first time [21]. In essence, the reverse 
process Y can be thought as an “adjoint” process to X, as its infinitesimal generator 
is essentially the adjoint operator of the infinitesimal generator of X (see below for a 
more detailed discussion in the SRN setting). 

In a nutshell, the idea is that the law of the diffusion bridge admits a Radon- 
Nikodym density with respect to the law of the concatenated process X*^l^ oX^^'l with 
density given by y, provided that the trajectories meet at time t*, i.e., provided that 

^In fact, Bayer and Schoenmakers [3] require g to be a smooth function of the values Xt^ of 
the process X along a grid ti, but a closer look at the paper reveals that more general, truly path- 
dependent functionals can be allowed. 
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Of course, this happens only with zero probability^, so we relax 
the above equality with the help of a kernel with a positive bandwidth e. Further¬ 
more, note that by the independence of X^^'^ and X^^\ we can independently sample 
many trajectories of X^f'> and many trajectories of X^^'> and then identify all pairs of 
trajectories satisfying the approximate identity X^^'>(t*) « as determined by 

the kernel k^. This results in a Monte Carlo algorithm, which, in principle, requires 
the calculation of a huge double sum by summing over all pairs of N samples from 
X^^'^ and M samples from X^^\ A naive implementation of that algorithm would 
require a prohibitive computational cost of order 0{M^) operations, but fortunately 
there are more efficient implementation relying on the structure of the kernel and often 
reducing the complexity to 0{M\og{M)) (see [3, 2]). In this way, the forward-reverse 
algorithm can nearly achieve the optimal Monte Carlo convergence rate of 1/2. More 
precisely, assuming enough regularity on the density of X and assuming the use of a 
kernel of sufficiently high order (depending on the dimension), the root-mean-squared 
error of the estimator is with a complexity 0(Mlog(M)) and a bandwidth 

of e = These statements assume that we can exactly solve the SDEs 

driving the forward and the reverse processes. Otherwise, the error induced by, say, 
the Euler scheme, will be added. 

The structure of the construction of the forward-reverse representation (1.6) and 
later of the corresponding Monte Carlo estimator in [3] strongly suggests that the 
forward-reverse approach does not rely on the continuity of diffusion processes, but 
merely on the Markov property. Hence, the approach was generalized to discrete time 
Markov chains in [2] and is generalized to the case of continuous time Markov chains 
with discrete state space in the this work. 

For a literature review on computational algorithms for computing conditional 
expectations of functionals of diffusion processes we refer to [3]. 

2. Expectations SRN-Bridge Functionals. In this section, we derive the dy¬ 
namics of the reverse paths and the expectation formula for SRN-brige functionals. 
The derivation follows the same scheme used in [21] , that is, i) write the master equa¬ 
tion, ii) manipulate the master equation to obtain a backward Kolmogorov equation 
and, iii) derive the infinitesimal generator of the reverse process. 

2.1. The Master Equation. Let be a SRN defined by the intensity-reaction 

pairs Let p{t,x,s,y) be its transition probability function, i.e., 

p{t,x,s,y):=P (X{s)=y\ X{t)=x) where x,y G and 0<t<s<T. The function p 
satisfies the following linear system of ODEs known as the master equation [9, 27, 31]: 

(2 1) I - i^j)p{t,x,s,y - Vj) - aj{y)p{t,x,s,y)), 

I P{t,X,t,y) =6a; = y, 

where 6 is the Kronecker delta function. 

A general analytic solution of (2.1) is in general computationally infeasible. Even 
numerical solutions are infeasible for systems with infinite or large number of states. 
For continuous state spaces, (2.1) becomes a parabolic PDE known as the Fokker- 
Planck Equation. Next, we derive the generator of the reverse process in the SRN 
setting. 

2.2. Derivation of the Reverse Process. Let us consider a fixed time interval 
[t, T], For s G [t, T] and x,y G Z'^, let us define v{s, y) := g{x)p{t, x, s, y) provided 

^In the SRN setting, the probability is positive, since the state space is discrete. 



that the sum converges. We remark here that v cannot in general be interpreted as an 
expectation of g. Indeed, while x,s,y) = 1, the sum over x could, in principle, 

even diverge. Hence, it is not a priori clear that v admits a stochastic representation. 
However, multiplying both sides of the master equation (2.1) by g{x) and summing 
over X, we obtain: 

(0 0] f 9sv{s,y) =J2j=iiajiy-’^j)v{s,y-Vj)-aj{y)v{s,y)), 

\ v{t,y) =g{y). 


Now, let us consider a time reversal induced by a change of variables s = T-\-t — s 
with {;(s, y) := v{T + t — s,y) = v{s, y) leading to the following backward equation: 


(2.3) 


f -dsv{s,y) = (ajiy - >^j)v(s,y- vj) - aj{y)v{s,y )), t<s<T, 

I v{T,y) =v{t,y) = g{y). 


Let Vj := —Vj. By adding and subtracting the term aj{y + Uj)v{s,y), we can 
write the first equation of (2.3) as 


,7 

dsv{s,y) + ^ {aj{y + Vj) {v{s,y + Vj) - v{s,y)) + {aj{y + Vj) - aj{y)) v{s,y)) = 0. 
7 = 1 


As a consequence, the system (2.3) can be written as 

(2 4) I '^5^(®’y) + S/=ia7(2/ + '^i)(i^(Ay + i^7)--5(s,y)) + c(y)'D(s,y) = 0, 

I v{T,y) = g{y), 


where c{y) := J2j=i ^jiy + Vj)-aj{y). 

Let us now define aj{y) := aj{y + Vj) and substitute it into (2.4). We have arrived 
at the following backward Kolmogorov equation [29] for the cost-to-go function v{s, y), 


(2.5) 


f dsv{s, y) + {y) {v{s, y + Vj)- v{s, yj) + c{y)v{s, y) = 0, 

1 v(T,y) = g{y). 


We recognize in (2.5) the generator ('D)(s, y) := ^j(y) 2/ + ~ y)) 

that defines the so-called reverse process Y = {K(s, by 

(2.6) P {Y{s + ds) = y + Vj \ Y(s) = y) = dj{y)ds 
or equivalently by, 

(2.7) P (y(s -I- ds) = y — Vj \Y{s) = y) = aj{y — Vj)ds. 

The Feynman-Kac formula [29] provides a stochastic representation of the solution 
of (2.5), 


( 2 . 8 ) 


v{s,y) = E 


5 (r(r))exp 


c{Y{s))ds 


Y{S)=y 


Notice that K is a SRN in its own right. We note in passing that stochastic representa¬ 
tions based on shifted evaluations of the propensities have been derived independently 
in [18, 17] to estimate variations and differences of the cost to go function. 
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2.3. The Forward-Reverse Formula for SRN. Let us consider a time inter¬ 
val [s,t] and assume that we only observe the process X on the end points, i.e., that 
we have X(s) = x and X{t) = y for some observed values a:, y S Fix an inter¬ 
mediate time s<t*<t, which will be considered a numerical input parameter later on. 
Denote by X^^'> the process X conditioned on starting at X^f\s) = x and restricted 
to the time domain 

Furthermore, let Y denote the reverse process constructed in (2.6) on the time 
domain [t*,t] (i.e., inserting t* for t and t for T in the above subsection) started at 
Y{t*) = y. As noted above, Y is again an SRN with reaction channels 
For convenience, we also introduce the notation X^^'^ for the process Y run backward 
in time, i.e., we define X^'^\u):=Y(t* +t—u) for u€[t*,t], and notice that X^^\t) = y. 

Recall that we aim to provide a stochastic representation^ i.e., a representation 
containing standard expectations only, for conditional expectations of the form, 

(2.9) nix, y) = E [$ (A, [s, t]) \ A(s) = a;, A(f) = y], 

for $ mapping Z^|.-valued paths to real numbers. Obviously, d) needs to be integrable 
in order for T-L to be well defined, and we shall also assume polynomial growth con¬ 
ditions on $ and its derivatives with respect to jump-times of the underlying path. 
Moreover, we assume that p{s,x,t,y) > 0. Once again, the fundamental idea of the 
forward-reverse algorithm of Bayer and Schoenmakers [3] is to simulate trajectories 
of X^^^ and (independently) of and then look for any pairs that are “linked”. 
Since the state space is now discrete, we may, in principle, require exact linkage in 
the sense that we may only consider pairs such that X^^^t*) = X^^^t*). However, 
in order to decrease the variance of the estimator, it may once again be advantageous 
to relax this condition by introducing a kernel. 

By a kernel, we understand a function k : Z^^ —>■ M satisfying 

= 1 - 

Moreover, we call k a kernel of order r > 0 if, in addition, 

x°"k.[x) = 0 


for any multi-index a with 1 < |a| < r, a := ai -\— • -I- ad, and x°‘ := x^^ ■ ■ ■ x'^’^, a G 
{0,1,2,...}. For instance, any non-negative symmetric kernel has order r = 1 in this 
sense. 

Having fixed one such kernel k, we define a whole family of kernels k^, indexed 
by the bandwidth e > 0, by 


Keix) = C^K 



with the constant being defined by the normalization condition ^«(^) = 1- 

Here, we implicitly assume the kernel, k, to be extended to for instance in a 
piecewise constant way. As we necessarily have k{x) —)■ 0 as |a;| —>■ oo, it is easy to see 
that we have the special case 


Ko{x) = 


1, a; = 0, 
0, a; 7 ^ 0. 
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Remark 2.1. The Kronecker kernel kq can also be realized as kq = for some 
eo > 0, which will depend on the base kernel k, provided that the base kernel k has 
finite support. 

Theorem 2.1. Let ^ be a continuous real-valued functional on the space of 
piecewise constant functions defined on [s,t] and taking values in (w.r.t. uniform 
topology) such that both TL and the right hand side of (2.10) is finite for any e. With 
Kg, and as above, we have 

( 2 . 10 ) 

E [a> (X(/) oXW, [s,t]) Kg(X(/)(t*) - xW(r))T [t*,t])] 

g™ E[Kg(X(/)(i*)-XW(t*))4'(xW,[i*,i])] 

where X^^^ o X^^'l denotes the concatenation of the paths X^^'^ and X^^'> in the sense 
defined by 


and 


X(/) 


X^^'> (u), s < u < t*, 
X^^'^ {u), t* < u < t, 


^'(Z, [a, 6]):= exp I / c[Z{u))du 


Remark 2.2. In line with Remark 2.1, we note that we could easily have avoided 
taking limits in Theorem 2.1 by replacing Kg with kq everywhere in (2.10). At this 
stage we note that the Monte Carlo estimator based on (2.10) with positive e will have 
considerable smaller variance than the version with e = 0, potentially outweighing the 
increased bias. 

Proof. [Sketch of proof of Theorem 2.1] For simplicity, we assume that the kernel 
K has finite support and that the functional $ is uniformly bounded. We will prove 
convergence of the numerator and the denominator in (2.10) separately. Let us, hence, 
prove the more general case first, i.e., the convergence 

(2.11) h(a;, y):='H{x, y) p{s, x, t, y) = 

$ o x^'^\ [s,t]) Kfix^f\t*) - x('’)(r))T [t*,t]^ . 

In the first step, we assume that $(Z, [s, fj) only depends on the values of Z on a 
fixed grid, say s = to < ti < ■ ■ ■ < t^ = t, i.e., 

$(Z, [s,t]) = g (Z(to), • ■ •, Z{tn)). 

Then (2.11) is proved (with minor modifications) in [3] (Theorem 3.4). Indeed, a 
closer look at that proof reveals that only Markovianity of X is really used. 

Furthermore, note that any continuous functional $ can be approximated by 
functionals d>„ depending only on the values of the process on a (ever finer) finite grid 
to,... ,tn. As, on the one side, 

h{x,y) = E[$(A, [s,t]) \ X{s) = x, X{t) = y]p{s,x,t,y) = 

lim E[$„ (A, [s,t]) I A(s) = x, X{t) = y\p{s,x,t,y) 


lim E 

e-)-0 
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and, on the other side, 


lim lim E 

e—>-0 n—>-<50 




lim E 

e-J-O 


We are left to prove that 




'X^f'> o X''^\ [s,t]) K^{X^f\t*) - X^'^\t*))'^ (^X^'>\ = 

o x^^\ [s,t]) K,{x^f\t*) - x^'>\t*))m (xw, [r,t])], 

which follows as Kq = ^^^o some eg > 0. In fact, it even follows in the general case 
by dominated convergence. 

Finally, the proof of convergence of the denominator is a special case of the proof 
for the numerator, and therefore, the convergence of the fraction follows from the 
continuity of (a, h) a/h for 6 > 0. □ 

3. The EM Algorithm for SRN. In this section, we present the EM algorithm 
for SRN, which is the main step for computing the parameter estimation. First, we 
explain the EM algorithm in general, and then, we derive the log-likelihood function 
for a fixed realization of the process, X. Einally, we present the EM algorithm for 
SRN. 

3.1. The EM Algorithm. The EM algorithm [7, 28, 33, 20] its named from 
its two steps: expectation and maximization. It is an iterative algorithm that, given 
an initial guess and a stopping rule, provides an approximation for a local maxi¬ 
mum or saddle point of the likelihood function, lik(0 \'D). It is a data augmentation 
technique in the sense that the maximization of the likelihood lik(0 | D) is performed 
by treating the data I? as a part of a larger data set, {V.V), where the complete- 
likelihood, lik'^(0 I 'D,'D), is amenable to maximization. Given an initial guess the 
EM algorithm maps into by the 


lim lim E 

e—>-0 n—>-oo 




lim lim E 

n—>-oo e—vO 


1. expectation step: Qg(p){9 \'D) log(lik‘^(0 | H, I?)) 

2. maximization step: := argmaxg (3g(p) (0 | H). 


V 


and the 


Here, Eg(p) ['ll?], denotes the expectation associated with the distribution of V under 
the parameter choice conditional on the data, H. In many applications, the ex¬ 
pectation step is computationally infeasible and Qg(p} {9 | V) should be approximated 
by some estimate. 


Qe<.p) I ■“ Eg(p) 


log(lik^(6i|l?,f>)) |I? 


Remark 3.1 (The Monte Carlo EM). If we know how to sample a sequence of 
M independent variates ~ P 11?, with parameter 9 ^p\ then we can define the 

following Monte Carlo estimator of Qg(p){9 | D), 


Qeip) I 25) E I 25, Vf)). 

^ i=l 


In Section 4, we describe how to simulate exact and approximate samples of 
V\V. 
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3.2. The Log-Likelihood Function for Continuously Observed Paths. 

The goal of this section is to derive an expression for the likelihood of a particular 
path, (X(t,wo))tG[o,T]! of the process X, where wq G is a fixed realization. An 
important assumption in this work is that the propensity functions aj can be written 
as aj{x) = Cjgj{x) for j=l,..., J and x G where gj are known functionals and Cj 
are considered the unknown parameters. Define 0:=(ci,... ,cj). Let us denote the 

jump times of (X(t,uJo))te[o,T] in (0,T) by 6. ■ • • >Civ-i- Define := 0, ^jv := T 

and for i = 0,1,..., N - 1. 

Let us assume that the system is in the state xg at time 0. We have that is 
the time to the first reaction, or equivalently, the time that the system spend at xg 
(sojourn time or holding time at state xg). Let us denote by the reaction that 
takes place at ^i, and therefore, the system at time is in the state Xi := Xg + r'fj. 
From the SSA algorithm, it is easy to see that the probability density corresponding 
to this transition is the product (xg) exp (—ao(a;o)A^o)- 

By the Markov property we can see that the density of one path ((^i, is 

given by 


N-l 

(3.1) n (a:i_i)exp(-ao(xi_i)A^i_i) x exp (-ao(a;Ar-i)A^Ar_i). 

i=l 

The last factor in (3.1) is due to the fact that we know that the system will remain 
in the state Xn-i in the time interval [^ 7 v-i,P). 

Rearranging the factors in (3.1), we obtain 


(3.2) 


exp - 



N-l 


1 —l)- 


2 = 1 


Now, taking logarithms in (3.2) we have 


N-l N-l 

- ^ ao(x,)A^, + ^ log(a^^. (x^_i)), 

2=0 2=1 

which by the definition of ao can be written as 

N-l J N-l 

-EE ^j(^ 0A^2 “h ^ ^ (Xi_i)). 

2^0 j — 1 2=1 

Interchanging the order in the summation and denoting the number of times that the 
reaction Uj occurred in the interval [0,T] by we have 

J / N-l \ N-l 

(3 !>) E -■=. E gj{xi)A£,i + \og{cj)Rj^yg,T\ j + log(gi.j^ (a;i_i)). 

3 = 1 \ 1=0 / i=l 

Observing that the last term in (3.3) does not depend on 9, the complete log-likelihood 
of the path (X(t,a;o))tg[o,T] is up to constant terms given by 

,7 

■= -OC.[o.t], with 6'=(ci,...,cj), 

7 = 1 


(3.4) 
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where ■= gjixo)^^o + ■ ■ ■ + gj{X{s)) ds. The last 

equality is due to gj being piecewise constant in the partition ■ ■ ■; ^n}- 

Now let us assume that we have a collection of intervals, {Ik = [sk,tk])k=i C [0, T], 
where we have continuously observed the process {X{t, ■))teik at each Ik- We define 
the log-likelihood function as: 


K 


t{e) :=^ log(c,)^i?,y, - 


i=i 


fc=i 



Remark 3.2. Note that Rjj^ and Fjj^ are random variables, which are functions 
of the full paths of X but not of the discretely observed paths. Hence, they are random 
given the data T> as defined in (1.1). 


3.3. The EM Algorithm for SRNs. According to the Section 3.1, for a par¬ 
ticular value of the parameter 9, say 9^p\ we define 

j / K K \ 

Qeiv) (ci,..., Cj I D) := ^ ( ( log(cj) ^ ( Eg(p) \ 2?] — Cj ^ ( Eg(p) \FjJ^, | 2^] 1 j 

3 — 1 \ k—1 k—1 / 

where Eg(p) [Rjj,^ | X>] = E^cp) [Rjj,, \ X{sk)=x{sk),X{tk)=x{tk)\ (by the Markov 
property), and analogously for Fjg,,. 

Now consider the partial derivatives of Qqm {ci,... ,cj \ 'D) with respect to Cj 


dcj Qe(p) (ci, ■ • ■, cj I T>) 


— ^ Eg(p) [Rjj,, I I ■ 

k=i k=i 


Therefore, XQg(p) (ci,..., cj 11?) = 0 is obtained at 9* = (c^,..., c}) such that 


(3.5) 


StiEe(P) [Rj,i,\R] 
Ef=iE,(p) [F,,I,\V]' 


This is clearly the global maximization point of the function Qg(p){- \ T)). 

The EM algorithm for this particular problem generates a deterministic sequence 
{9^P'^)p^ that starts from a deterministic initial guess 9^^'> provided by phase I (see 
Section 4.1) and evolves by 


/o c'l „(p+i) ^ EflCp) [RjJk I 

EtrE,<p,[F,,JP]’ 

where 9<-p^ = (c^^\ ..., 

4. Forward-Reverse Monte Carlo EM Algorithm for SRNs. In this sec¬ 
tion, we present a two-phase algorithm for estimating the parameter 9. Phase I is 
deterministic while phase II is stochastic. We consider the data, T>, as given by (1.1). 
The main goal of this section is to provide a Monte Carlo version of formula (3.6). 
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4.1. Phase I: using Approximating ODEs. The objective of phase I is to 
address the key problem of finding a suitable initial point 9^^^ to reduce the variance 
(or the computational work) of phase II, thereby increasing (in some cases dramati¬ 
cally) the number of SRN bridges from the sampled forward-reverse trajectories for 
all time intervals. 

Let us now describe phase I. From the user-selected seed, 9^p, we solve the fol¬ 
lowing deterministic optimization problem using some appropriate numerical iterative 
method: 


(4.1) 


9 II 


:= argmin > Wk 
e>o , 




Here, Z^f^ is the ODE approximation defined by (1.5) in the interval to the 

SRN defined by the reaction channels, and the initial condition x{sk); 

Zir) 

is the ODE approximation in the interval to the SRN defined by the 

reaction channels, ((—dj))/=i, and by the initial condition x{tk). Let us recall that 
in Section 2.2, dj{x) was defined as aj(x—Vj). We define Z^’^\u, 9):=Z^^'' {tl+tk—u, 9) 
for u G Furthermore, Wk-={tk—Sk)~^ and |i-|j is the Euclidean norm in The 

rationale behind this particular choice of the weight factors is based on the mitigation 
of the effect of very large time intervals, where the evolution of the process, X, may 
be more uncertain. A better (but more costly) measure would be the inverse of the 
maximal variance of the SRN bridge. 

Remark 4.1 (An alternative definition of 9^^). In some cases, convergence 
issues arise when solving the problem (4.1). We found it useful to solve a set of 
simpler problems whose answers can be combined to provide a reasonable seed for 
phase II: more precisely, we solve K deterministic optimization problems, one for 
each time interval [sfc,tfc]; 


Afc := argmin 
e>o 


Z^f\tl-9) - Z^^\tl-9) 


all of which were solved iteratively with the same seed, 9 


(0) 

7 


Then, we define 


(4.2) 


i(o) Sfc 

■“ Ek^^k • 


4.2. Phase II: the Monte Carlo EM. In our statistical estimation approach, 
the Monte Carlo EM Algorithm uses data (pseudo-data) generated by those forward 
and backward simulated paths that result in exact or approximate SRN bridges. In 
Figure 4.1, we illustrate this idea for the wear example data presented in Section 6.2. 
Phase II implements the Monte Carlo EM algorithm for SRNs. 

4.2.1. Simulating Forward and Backward Paths. This phase starts with 
the simulation of forward and backward paths at each time interval Ik, for k=l ,..., K. 
More specifically, given an estimation of the true parameter 9, say, 0 = (ci, C 2 ,..., cj), 
the hst step is to simulate Aik forward paths with reaction channels {vj, Cjgj(x))j^i 
in all of them starting at Sk from x(sk) (see Section 5.1 for details about 

the selection of Mk). Then, we simulate Mk backward paths with reaction channels 
{-iyj,Cjgj{x - ’^j))j=i in [tl,tk], all starting at tk from x{tk). Let {tl,Cjm))mLi 
and (!('>) (t* , Wm'))m'Ci denote the values of the simulated forward and backward 
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0.25 0.3 0.35 0.4 0.45 0.5 

Time 


Fig. 4.1. Left: Illustration of the forward-reverse path simulation in Phase II. The plot cor¬ 
responds to a given interval for the wear data, presented in Section 6.2. The observed values are 
marked with a black circle (beginning and end of the interval). On the y-axis we plot the thickness 
process X(t), derived from the wear process of the cylinder liner. Observe that every forward path 
that ends up at a certain value will be joined with every backward path that ends up at the same 
value when using the Kronecker kernel. For example, this happens at value 58, where several forward 
paths end and several backward paths start. Right: Zoom near value 58. 


paths at the time respectively. If the intersection of these two sets of points is 
nonempty, then, there exists at least one m and one m' such that the forward and 
backward paths can be linked as one SRN path that connects x{sk) and x{tk) data 
values. 

When the number of simulated paths Mj. is large enough, and an appropriate 
guess of the parameter 9 is used to generate those paths, then, due to the discrete 
nature of our state space we expect to generate a sufficiently large number of 
exact SRN bridges to perform statistical inference. However, at early stages of the 
Monte Carlo EM algorithm, our approximations to the unknown parameter 9 are not 
expected to provide a large number of exact SRN bridges. In such a case, we can 
use kernels to relax the notion of an exact SRN bridge (see Section 2.3). Notice that 
in the case of exact SRN bridges, we are implicitly using a Kronecker kernel in the 
formula (2.10), that is, k takes the value I when and 0 

otherwise. We can relax this condition to obtain approximate SRN bridges. 

To make computationally efficient use of kernels, we sometimes transform the 
endpoints of the forward and backward paths generated in the interval Ik, 

(4.3) A-fe := {X(fjtl,dj,),x^fjtld;2), •.., (4,ccmJ, 

X^^HtluJM, + l),X^^\tl,CjM fc+ 2 ), • ■ ■, X(-^'>{tlC02Mj), 

into 

(4.4) HiXk):= {t *,, c5i), (^, ),..., (^, ccmJ, 

fW(f:,c5M.+i),EW(4,(iM.+2),...,M('')(4,<i2Mj), 

by a linear transformation H with the aim of eliminating possibly high correlations 
in the components of Xk- The original cloud of points Xk formed by extremes of the 
forward and backward paths is then transformed into H{Xk), which hopefully has a 
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covariance matrix close to a multiple of the d-dimensional identity matrix aid- Ideally, 
the coefficient a should be chosen in such way that each d-dimensional unitary cube 
centered at Y^^\t*f.,Cbrn) contains on average one element of Note 

that this transformation changes (generally slightly) the variances of our estimators 
(see Section 5.3 for details about the selection of a and If). 

In our numerical examples, we use the Epanechnikov kernel 


( 4 . 5 ) 


k(v) ■= 



d 

di )l{|7)i |<1} ) 

i=l 


where 77 is defined as 

( 4 . 6 ) 77 = 77^(777,777') := - F('’)( 4 ,w„/). 


This choice is motivated by the way in which we compute r]k{m,m') avoiding 
whenever possible to make calculations. The support of k is perfectly adapted to 
our strategy of dividing into unitary cubes with vertices in 

4.2.2. Kernel-weighted Averages for the Monte Carlo EM. As we pre¬ 
viously mentioned, the only available data in the interval Ik correspond to the ob¬ 
served values of the process, X, at its extremes. Therefore, the expected values 
Eg(p) [Rjjk I H] and Eg(p) | V] in the formula (3.6) must be approximated by 

SRN-bridge simulation. To this end, we generate a set of Mk forward paths in the 
interval Ik using as the current guess for the unknown parameter Having 

generated those paths, we record Rj j^{u}rn) and for all j = 1, 2,..., J and 

777 = l,2,...,Mfe as defined in Section 3.2. Analogously, we record R^J‘]^{Cjm') and 
(wm') for all j = 1, 2, ..., J and m' = 1, 2, ..., Mk- 

Consider the following k- weighted averages, where k = for an appropriate 
choice of bandwidth e that approximate Eg(p) [Rjj^ \ F] and E^kp) [Fjj^, | P], respec¬ 
tively: 

n{r]k{m, m'))i;k{m') ^ 

Em,m' K{r]k{m,m'))'tPk{m') 

Em,m' K(??fc("l, 777'))7/^fc(777') 

where rjf^{m,m!) has been defined in (4.6) and m,m' = 1,2,..., Mk and '^^(to') := 
J Cj(AW(s ,uirn'))ds^ (according to Theorem 2.1). Observe that we generate 
Mk forward and reverse paths in the interval R but we do not directly control the 
number of exact or approximate SRN bridges that are formed. The number Mk is 
chosen using a coefficient of variation criterion, as explained in Section 5.1. In Section 
5.2, we indicate an algorithm to reduce the computational complexity of computing 
those K-weighted averages from O(M^) to 0{Mk\og{Mk))- 

Finally, the Monte Carlo EM algorithm for this particular problem generates a 
stochastic sequence staring from the initial guess provided by phase I 



( 4 . 7 ) 

{dijjk I ■= 
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(4.1) and evolving by 


(4.8) 


X]fc=i i^jjk I 

X]fe=i I 


where ..., ■ In Section 5.4, a stopping criterion based on techniques 

widely used in Monte Carlo Markov chains is applied. 

5. Computational Details. This section is intended to show computational 
details omitted in Section 4. Here, we explain why and how we transform the clouds 
Xk consisting of endpoints of forward and reverse paths in the time interval Ik at 
the time for k=l, Then, we explain how to chose the number of simulated 

forward and backward paths, Mk, in the time interval Ik to obtain accurate estimates 
of the expected values of Rjj^ and for j = 1, 2,..., J. Next, we show how to 
reduce the computational cost of computing approximate SRN bridges from 0(M|) to 
0{Mk log(Mfc)) using a strategy introduced by Bayer and Schoenmakers [2]. Finally, 
we indicate how to choose the initial seeds for phase I and a stopping criteria for phase 
II. 


5.1. On the Selection of the Number of Simulated Forward-Back-ward 
Paths. The selection strategy of the number of sampled forward-backward paths, 
Mk, for interval Ik, is determined by the following sampling scheme: 

1. First sample M forward-reverse paths (in the numerical examples we use 
M=100). 

2. If the number of joined forward-reverse paths using a delta kernel is less than 
a certain threshold, 7 , we transform the data as described in Section 5.3. 
This data transformation allows us to use the Epanechnikov kernel (4.5). In 
this way, we are likely to obtain a larger number of joined paths. 

3. We then compute the coefficient of variation of the sample mean of the sum 
of the number of times that each reaction j occurred in the interval Ik, 

Here 

and the coefficient of variation of the sample mean of the sum = 

Jj^gj{X^P{s))ds. Further details can be found in Section 3.2. The coef¬ 
ficient of variation (cv) of a random variable is defined as the ratio of its 
standard deviation a over its mean /i, cu := In this case, for the reaction 
channel j in the interval Ik, we have: 


and 


c^'^’(4, j) = L 


- 1/2 i^rn)', Lk) 


where S{Y; L):=A{Y'^; L) — A{Y; L)^ is the sample standard deviation of the 
random variable Y over an ensemble of size L and A{Y ; L):=^ H i^m) 

is its sample average. Here Lk denotes the number of joined paths in the 
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interval k, which is bounded by M^. In the case that Lk is small, we compute 
a bootstrapped coefficient of variation. 

The idea is that by controlling both coefficients of variation, we can control 

''(p) 

the variation of the p-th iteration estimation Of/. Our numerical experiments 
confirm this fact. 

4. If each coefficient of variation is less than a certain threshold then the sam¬ 
pling for interval Ik finishes, where Mk is the total number of sampled paths, 
and accepting the quantities in step 3. and also the quantities K(r]k{m, 
m,m' = 1,...,L as defined in Section 3.2. Otherwise, we sample additional 
forward-reverse paths (increasing the number of sampled paths at each iter¬ 
ation M) and go to step 2. 

This selection procedure is implemented in Algorithm 1 . 

5.2. On the Complexity of the Path Joining Algorithm. In this section, 
we describe the computational complexity of Algorithm 2 for joining paths in phase 
II, and show that this complexity is 0{M\og{M)) on average. 

Let us describe the idea. First, fix a time interval Ik and a reaction channel j. 
We use the following double sum as an example, 

M M 

Krn,rn'■ 

m—1 m' — l 

A double sum like this one appears in the numerator of (4.7). Instead of computing a 
double loop which always takes O(M^) steps (and many of those steps contribute 0 to 
the sum), we take the following alternative approach: let Bi] be the smallest 

hyperrectangle of sides [A^, Bi], i = 1,..., d that contains the cloud y, defined in (4.4). 
Let us also assume that Ai,Bi, i = 1,..., d are integers. The length Bi — Ai depends 
on how sparse the cloud is in its i-th dimension. Given the cloud, it is easy to check 
that the values Ai,Bi, i = I,...,d can be computed in 0{M) operations. Now, we 
subdivide the hyperrectangle into sub-boxes of size-length I, with sides parallel to the 
coordinate axis. 

Since we have a finite number of those sub-boxes, we can associate an index for 
each one in such a way that it is possible to directly retrieve each one using a suitable 
data structure (for example an efficient sparse matrix or a hash table). The average 
access cost of such structure is constant with respect of M. For each sub-box, we 
associate a list of forward points that ended up in that sub-box. It is also direct to 
see that the construction of such a structure takes a computational cost of M steps 
on average. Then, instead of evaluating the double sum which has 0{M‘^) steps, we 
evaluate only the non zero terms. This is because, when a kernel k is used, k{x, j/) 7 ^ 0 
if and only if x and y are situated in neighboring sub-boxes. That is, 

M M 

E E 

m—1 m' — l 

M 3"^ n(bi) 

= EEEHi (W£(Z)) + Rfj^{Cbm')^ 

m' — l i—1 1 — 1 

where n{bi) is the total quantity of reverse end points associated with the i-th neighbor 
of the sub-box to which the forward end-point, belongs, whereas i{l) 
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indexes one of those reverse end points. Note that the constant of this complexity 
depends exponentially on the dimension 

The cost that dominates the triple sum on the right-hand side is the expected max¬ 
imum number of reverse points that can be found in a sub-box. This size can be proved 
to be 0(log(M)), which makes the whole joining algorithm of order 0{M\og{M)). 
For additional details we refer to [3]. 

5.3. A Linear Transformation for the Epanechnikov Kernel. Our nu¬ 
merical experiments show that clouds formed by the endpoints of simulated paths, A, 
usually have a shape similar to the cloud Z shown in the left panel of Figure 5.1. 

It turns out that partitioning the space into d-dimensional cubes with sides par¬ 
allel to the coordinate axis is not idle for selecting kernel domains and consequently 
for finding SRN bridges. It is more natural way to divide the space into a system of 
parallelepipeds with sides parallel to the principal directions of cloud Z having sides 
proportional to the lengths of its corresponding semi-axes to use as supports for our 
kernels. 

Another way of proceeding (somehow related but not totally equivalent) is to 
transform the original cloud Z to obtaining another cloud T{Z) with a near-spherical 
shape. Then, scale it to have on average one point of the cloud in each d-dimensional 
cube (with sides parallel to the coordinate axis). In this new cloud, H{Z), we can 
naturally find neighbors using the algorithm described in Section 5.2 below and we 
have the Epanechnikov kernel to assign weights. This is why in Section 4 we wanted to 
transform the data Xk into an isotropic cloud, such that, every unitary cube centered 
in ,(!)(„) contains, on average, one point of the cloud 

We will now describe the details of the aforementioned transformations. 

First, we show a customary procedure in statistics to motivate the transformation. 
Let E := cov(2^) be the sample covariance matrix computed from a cloud of points 
Z. To obtain a de correlated version of Z, the linear transformation T{z) = z 

is widely used in statistics. For example, consider a cloud Z of points obtained by 
sampling 10^ independent highly correlated bi-variate Gaussian random variables. 
The corresponding cloud T{Z), depicted in the right panel of Figure 5.1, shows the 
aspect of a sphere with a radius 3 units. The next step is to obtain a radius a 



Fig. 5.1. Left: A bivariate Gaussian cloud, Z. Right: Rs corresponding decorrelated and scaled 
version T{Z). 


such that the volume of a d-dimensional sphere of radius 3a equals to the volume 
of M unitary d-dimensional cubes. From the equation M = {3aY Vd, we obtain 




20 



Fig. 5.2. Cloud H[Z). 


a = I where Vd = r(d/ 2 +i) volume of the unitary sphere in 

Therefore, the linear transformation H is defined by H{x) := aT(x). The result of 
this transformation is depicted in Figure 5.2 in our Gaussian example. 

In general, we do not expect to have a Gaussian-like distribution for , however 
it seems to be a good approximation in our numerical examples. At this point, it is 
worth mentioning that in examples with several dimensions (species), the number of 
approximate SRN bridges we get by using the transformation may be of the order 
of . This indicates that the bandwidth is too large, and consequently the bias 
introduced in the estimation may be large. In these cases, we expand a by a factor of 
1.5, for example, until 0{M) approximate bridges are formed. Generally, one or two 
expansions are enough. 

A motivation for the Gaussian approximation is that, for short time intervals and 
in certain regimes of activity of the system, specially where the total propensity, oq, is 
high enough, a Langevin approximation of our SRN provides an Ornstein-Uhlenbeck 
process, which can potentially be close in distribution to our SRN (see [22]). 

Remark 5.1. According to the transformation H, the kernel used in our case is 
approximately equal to 

where k is the Epanechnikov kernel defined in (4.5), since it corresponds with the 
continuous case and not with the lattice case. 

Remark 5.2. We can even consider a perturbated version of T, say Tc, by 
adding a multiple of the diagonal matrix formed by the diagonal elements ofE, i.e., 
Tc = (S + c diag{'E))~^/‘^, where c is a positive constant of order 0(1). The linear 
transformation Tc can be considered as a regularization of T that does not change the 
scale of the transformation T. 

5.4. On the Stopping Criterion. A well-known fact about the EM algorithm 
is that, given a starting point, it converges to a saddle point or a local maximum of 
the likelihood function. Unless we know beforehand that the likelihood function has 
a unique global maximum, we cannot be sure that the output of the EM Algorithm is 
the MLE we are looking for. The same phenomenon occurs in the case of the Monte 
Carlo EM algorithm, and for that reason Casella and Robert in [28] recommend 
generating a set of N (usually N around five) parallel-independent Monte Carlo EM 
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sequences starting from a set of over dispersed initial guesses. Usually, we do not 
know even the scale of the coordinates of our unknown parameter 6 = (ci, C 2 ,..., c^). 
For that reason, we recommend running only phase I of our algorithm over a set of 
uniformly distributed random samples drawn from a d-dimensional hyperrectangle 
nf=i(0, Ci], where Ci is a reasonable, case dependent, upper bound for each reaction 
rate parameter c^. We observed in our numerical experiments that the result of this 
procedure is a number of points laying on a low dimensional manifold. Once this 
manifold is identified, N different initial guesses are taken as over dispersed seeds for 
phase II. 

Note that the stochastic iterative scheme given by formula (4.8) may be easily 
adapted to produce N parallel stochastic sequences, {0^u\)p=n where, for each i = 
1,2,... ,N, the distribution of the random variable depends on its history of 

realizations, only through its previous value, In this sense, the N 

sequences, are MCMC sequences [26, 28]. 

There is a number of convergence assessment techniques or convergence diagnostic 
tools in the MCMC literature; in this article, we adopt the R criterion by Gelman 
and Rubin [II, 10], which monitors the convergence of N parallel random sequences 
where i = l,2,...,N. 

Compute: 


N 


N 


N 


N -l ^ ~ ■= ■= Tv X! and 




N 


^p ^ ^ ®p.*> ®p.* ■= 11 {a ~ ^P’) ■ 

A — 1 1 _1 




Then define 
(5.1) 


1 ;:= 


- Wp + Bp 

P 


and Rp 



B and W are known as between and within variances, respectively. It is expected that 
R (potential scale reduction) declines to 1 as p —>■ -l-oo. In our numerical experiments 
we use 1.4 as a threshold. 

Observe that if for all p, the values 'ipp^i are grouped in a very small cluster, i.e., 
'^p,i ~ '0P and therefore we have essentially only one Markov chain, then Bp is close 
to zero and Rp « —>■ 1 as p —>■ -l-oo independently of the behavior of the chain. 

To avoid this undesirable situation, we propose to observe also the behavior of the 
moving averages of order L, that is, 


(5.2) 


ipp := 


1 ^ 

vS 

2=1 


(i^p,i - where 


1 

L 




e=o 


We stop when 'ijjp is sufficiently small. 

Once we stop to iterate after p* iterations, the individual outputs 


nip*) nip*) 

^ 11,1 ’ ^ 11,2 I • ■ 


nip*) 

> '’n,N 
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form a small cluster. Although we cannot be certain that this cluster is near the MLE, 
we do have at least some confidence. Therefore, we can use the mean of this small 
cluster as a MLE estimation of our unknown parameter, 0. Otherwise, if we have two 
or more clusters or over dispersed results, we should perform a more careful analysis. 

Remark 5.3. The R stopping criterion only works if the over dispersed seeds 
obtained in phase I lie in the basin of attraction of one local maximum of the likelihood 
function. Otherwise R may not decrease to 1, even worse, it may go to +oo. For 
that reason, it is recommendable to monitor the evolution of R. In our numerical 
examples we have that R is decreasing and we stop the algorithm using Rq = 1.4 as a 
threshold. 

6. Numerical Examples. In this section, we present numerical results that 
show the performance of our FREM algorithm. In phase I, we use the alternative 
definition of described in Remark 4.1. For phase II, we run N = 4 parallel 
sequences using 1.4 as a threshold for R (described in Section 5.4). The moving 
average order used in all numerical examples is L = 3 (see formula 5.2), and the 
associated tolerance is 0.05. As a point estimator of 6, we provide the cluster average 
of the sequence ^ ^ ^n- 

For each example, we report i) the number of iterations of phase II, p*; ii) a table 
containing a) the initial points, b) the outputs of the phase I, 9^^\, and c) the 

outputs of phase II, 9^j i', and iii) a Figure with all those values. 

For the examples wehre we generate synthetic data, we provide the seed parameter 
9g we used to generate the observations. It is important to stress that the distance 
from our point estimator to 9g depends of the number of generated observations. 

6.1. The Decay Process. We start with a simple decay model with only one 
species and two reaction channels. Its stoichiometric matrix and propensity function 
are: 


ly 


T 


-1 

-4 


and a{X) 


ciX 

C2X ■ 1{X>4} 


respectively. 


We set Ao=100, T=1 and consider synthetic data observed in uniform time inter¬ 
vals of size At=rT, This determines a set of 17 observations generated from a single 
path using the parameter 6 >g=( 3.78, 7.20). The data trajectory is shown in Figure 

6 . 1 . 

For this example, we use N=4 FREM sequences starting at 0[°j=(l, 5), 9f'l={6, 5), 

0 j°3 =(1,9), and 0j°]=(6, 9). In this and the following examples, for each interval we 
run a minimum of M = 100 forward-reverse sample paths and we set a coefficient of 
variation threshold of 0.1 (see Section 5.1). 

We illustrate one run of the FREM algorithm in the left panel of Figure 6.2 and 
in Table 6.1. For that run, the cluster average is 0=(3.68, 7.50), and it took p*=3 
iterations to converge for a R threshold equal to 1.4. We take 0 as a MLE point 
estimation of the unknown parameters. 

We computed an ensemble of 30 independent runs (and obtained 30 cluster av¬ 
erages). The result is shown in the right panel of Figure 6.2. We observe that the 
variability of the cluster average is indeed very small, indicating the robustness of the 
method and that 1.4 is a reasonable choice as a threshold for R. Details are shown in 
Table 6.2. 
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Fig. 6.1. Data trajectory for the decay example. This is obtained by observing the values of an 
SSA path at uniform time intervals of size At=l/16. 




Fig. 6.2. Left: One FREM estimation (phase I and phase II) for the decay example. The N 
final values of this particular run are shown as circles. Right: We show 30 independent runs of the 
FREM algorithm. 


i 


0 =^ 

3(0) 

^ILi 

^ 

v_- 

II 

o 

1 

(1, 5) 

(1.35, 

10.67) 

(3.65, 7.52) 

2 

(6, 5) 

(7.85, 

9.11) 

(3.80, 7.46) 

3 

(1, 9) 

(1.20, 

10.71) 

(3.63, 7.50) 

4 

(6, 9) 

(7.06, 

9.30) 

(3.65, 7.50) 


Table 6.1 

Values computed by one run of the FREM Algorithm for the decay example, corresponding to 
the left panel of Figure 6.2. 


Remark 6.1. Recall that the distance between the value 6 q used to generate 
synthetic data and the estimation 6 is meaningless for small data sets. The relevant 
distance in this estimation problem is the one we obtain from our FREM algorithm 9 
and the 9mle based on maximizing the true likelihood function however, the later is 
not available in most cases. 
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Average 

Average Cl at 95% 

Min Value 

Max Value 

Cl 

3.69 

(3.681, 3.699) 

3.66 

3.77 

C2 

7.50 

(7.495, 7.505) 

7.48 

7.51 


Table 6.2 


Values computed for an ensemble of 30 independent runs of the FREM algorithm for the de¬ 
cay example. In each run, we obtain a cluster average, fit*), as an MLE point estimate. Define 
. Eor each unknown coefficient Cj in 9, we show i) the average ofC, ii) a 95% confi¬ 
dence interval for the mean ofC, and in) the minimum and maximum values ofC. 


6.2. Wear in Cylinder Liners. We now test our FREM algorithm by using 
real data. The data set w = taken from [14], consists of wear levels observed 

on n = 32 cylinder liners of eight-cylinder SULZER engines as measured by a caliper 
with a precision of A = 0.05 mm. Data are presented in Figure 6.3. 


Observed wear process 



Fig. 6.3. Data set from [14]. Data refer to cylinder liners used in ships of the Grimaldi Group. 


The finite resolution of the caliper allows us to represent the set of possible mea¬ 
surements using a finite lattice. Let X(t) be the thickness process derived from the 
wear of the cylinder liners up to time t, i.e., X{t) = Xq — W{t), where W is the wear 
process and Xq is the initial thickness. The final time of some observations is close 
to r=60, 000 hours. 

We model X{t) as a decay processes with two reaction channels and A = 0.05, 
since a simple decay process is not enough to explain the data. The two considered 
intensity-jump pairs are (oi(a:),i^i) = (cix,—A) and {a 2 {x),V 2 ) = (c2X, —4A). Here, 
Cl and C2 are coefficients with dimension (mm • hour)“^. 

The linear propensity functions, the value Ao=5 mm and the initial values for 
phase I: 0®=(1,1), = 1), 0|°3=(1,1O) and d[°]=(10,10), are motivated by 

previous studies of the same data set (see [22] for details). 

In our computations, we re scaled the original problem by setting A=1 and r=l. 

We illustrate one run of our FREM algorithm in the left panel of Eigure 6.4 and 
in Table 6.3. For that run, we obtained a cluster average of 0=(8.91, 5.74), which 
corresponds to 0o=(1.5 • 10“^, 0.97 • 10“^) in the non scaled model. The algorithm 
converged after p*=93 iterations using 1.4 as a threshold for R. We take that cluster 
average as an MLE point estimation of the unknown parameters. 
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Fig. 6.4. Left: FREM estimation (phase I and phase II) for the wear example. The N final 
values of this particular run are shown as circles. Right: We show 30 independent runs of the 
FREM algorithm. 


B=e 


(0) 




0=e(fj 


1 (1, 1) (2.81, 9.90) (8.56, 5.83) 

2 (10, 1) (36.88, 1.58) (9.07, 5.71) 

3 (1, 10) (1.13, 10.31) (8.68, 5.80) 

4 (10, 10) (11.44, 7.79) (9.34, 5.62) 

Table 6.3 

Values computed by one run of the FREM algorithm for the wear example corresponding to the 
left panel of Figure 6 . 4 . 


We computed an ensemble of 30 independent runs (and obtained 30 cluster aver¬ 
ages). The result is shown in the right panel of Figure 6.4. We observe that there is 
a small variability in the estimates indicating the robustness of the method. Details 
are shown in Table 6.4. 



Average 

Average Cl at 95% 

Min Value 

Max Value 

Cl 

8.94 

(8.90, 8.98) 

8.71 

9.22 

C2 

5.73 

(5.72, 5.74) 

5.66 

5.79 


Table 6.4 


Values computed for an ensemble of 30 independent runs of the FREM algorithm for the 
wear example. In each run, we obtain a cluster average, ©hi, as an MLE point estimate. De¬ 
fine C:=(6bl)?£j^. Eor each unknown coefficient Cj in 9, we show i) the average of C, ii) a 95% 
confidence interval for the mean of C, and Hi) the minimum and maximum values of C. 


Remark 6.2. In this particular example, the data set was obtained using a caliper 
with finite precision. Therefore, our likelihood should also incorporate the distribution 
of the measurement errors, which may be assumed Gaussian, independent, and iden¬ 
tically distributed with mean zero and variance equals to the caliper’s precision. We 
omitted this step in our analysis for the sake of simplicity and brevity. 

Remark 6.3. Comparing our FREM estimate, 0=(1.5TO“'^, 0.97-10“^), with the 
value obtained in [22] for the same data set and the same model, 0=(O.63 • 10“^, 1.2 • 
10“^), we obtained the same scale in the coefficients and a quite similar confidence 
band, see Figure 6.5. 
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Data and the 90% confidence band Data and the 90% confidence band 




Fig. 6.5. Left: confidence band with the parameter 9 obtained in [22] for the wear example. 
Right: the confidence band obtained with the FREM algorithm. 


6.3. Birth-Death Process. This model has one species and two reaction chan¬ 
nels: 




described by the stoichiometric matrix and the propensity function 

^ I ^ and a{X) = ^ ^ > respectively. 

Since we are not continuously observing the paths of X, an increment of size k in 
the number of particles in a time interval [ti, t 2 ] may be the consequence of any 
combination of n+k firings of channel 1 and n firings of channel 2 in that interval. 
This fact makes the estimation of ci and C 2 nontrivial. 

We set Xo=17, T=200 and consider synthetic data observed in uniform time 
intervals of size At=5. This determines a set of 41 observations generated from a 
single path using the parameter 0 g=( 1, 0.06). The data trajectory is shown in Figure 
6 . 6 . 

For this example, we ran 7V=4 FREM sequences starting at 0j°]=(O.5,0.04), 

0.08), 0j° 3=(1.5, 0.04), and 0j°4=(1.5,0.08). Those points where chosen 
after a previous exploration with phase I. 

We illustrate one run of our FREM algorithm in the left panel of Eigure 6.7 and 
Table 6.5. Eor that run, we obtained a cluster average of 0=(1.22,0.065). The EREM 
algorithm took p*=95 iterations to converge using a threshold of 1.4 for R. We take 
that cluster average as a MLE estimation of the unknown parameters. 


i □ = of) 

1 (0.5, 0.04) 

2 (0.5, 0.08) 

3 (1.5, 0.04) 

4 (1.5, 0.08) 


0 = 

(6.24e-01, 3.29e-02) 
(7.68e-01, 4.07e-02) 
(l.Ole-kOO, 5.25e-02) 
(1.53e-k00, 7.97e-02) 
Table 6.5 


O = dffj 

(1.24e-k00, 6.55e-02) 
(1.29e-k00, 6.67e-02) 
(1.18e-k00 6.27e-02) 
(1.20e-k00, 6.34e-02) 


Values computed by one run of the FREM Algorithm for the birth-death example corresponding 
to the left panel of Figure 6.7. 
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Fig. 6.6. Data trajectory for the Birth-death example. This is obtained by observing the values 
of an SSA path at uniform time intervals of size At=5. 




Fig. 6.7. Left: FREM estimation (phase I and phase II) for the birth-death example. The N 
final values of this particular run are shown as circles. Right: We show 30 independent runs of the 
FREM algorithm. 


We compute an ensemble of 30 independent runs (and obtained 30 cluster aver¬ 
ages), and the result is shown in the right panel of Figure 6.7. We observe a moderate 
variability in the estimates. This may indicate that the R threshold needs to be de¬ 
creased and consequently more iterations of the algorithm may be needed. Details 
are shown in Table 6.6. 



Average 

Average Cl at 95% 

Min Value 

Max Value 

Cl 

1.243 

(1.237, 1.249) 

1.213 

1.284 

C2 

0.0659 

(0.0655, 0.0663) 

0.0643 

0.0681 


Table 6.6 


Values computed for an ensemble of 30 independent runs of the FREM algorithm for the birth- 
death example. In each run, we obtain a cluster average, as an MLE point estimate. Define 

For each unknown coefficient cj in 0, we show i) the average ofC, ii) a 95% confidence 
interval for the mean of C, and Hi) the minimum and maximum values of C. 
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6.4. SIR Epidemic Model. In this section we consider the SIR epidemic 
model, where X{t) = {S{t), I(t), R{t)) (susceptible-infected-removed individuals) and 
the total population is constant, S'-|-/-l-i? = N (see [5]). The importance of this exam¬ 
ple lies in the fact that has a nonlinear propensity function and it has two dimensions. 

This model has two reaction channels 


S+1A2I, I^R 


described by the stoichiometric matrix and the propensity function 


u 


T 


-1 0 \ 

1 and a{X) 

0 1 j 


psi \ 
'yl J 


We set Xo=(300, 5), T=10 and consider synthetic data generated using the parameters 
0G=(1-66, 0.44) by observing X at uniform time intervals of size At=l. The data 
trajectory is shown in Figure 6.8. 
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Fig. 6.8. Data trajectory for the SIR example. This is obtained by observing the values of an 
SSA path at uniform time intervals of size At=l. 


Data trajectory 


o s 

X I 


For this example we ran iV=4 FREM sequences starting at = (0.40, 0.05), 

0j°2=(O-4O, 1.00), 0j° 3=(3.OO, 0.05), and 0j°4=(3.OO, 1.00). Those points where chosen 
after some previous exploration with phase I. 

We illustrate one run of the FREM algorithm in the left panel of Figure 6.9. Our 
MLE point estimation is obtained as the cluster average of the values shown in Table 
6.7, that is d=(1.65,0.39). The FREM algorithm took p*=3 iterations to converge, 
using 1.4 as a threshold for R. 

We computed an ensemble of 30 independent runs (and obtained 30 cluster av¬ 
erages); results are shown in the right panel of Figure 6.9. We observe a very small 
variability in our estimates; details are shown in Table 6.8. 
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Fig. 6.9. Left: FREM estimation (phase I and phase II) for the SIR example. The N final 
values of this particular run are shown as circles. In this particular case, where the results of phase 
I collapses to a single point, = 4 FREM sequences seem to be unnecessary, but we note that the R 
criterion needs at least 2 sequences. Right: We show 30 independent runs of the FREM algorithm. 


1 

T 

2 

3 

4 


□ = 0 = 

(0.40, 0.05) (1.50, 0.38) 

(0.40, 1.00) 4-50, 0.38) 

(3.00, 0.05) 4-50, 0.38) 

(3.00, 1.00) (1-50, 0.38) 

Table 6.7 


_ a ( P *) 

O - 

(1.65, 0.39) 
(1.65, 0.39) 
(1.66, 0.39) 
(1.66, 0.39) 


Values computed by one run of the FREM Algorithm for the SIR example corresponding to the 
left panel of Figure 6.9. 



Average 

Average Cl at 95% 

Min Value 

Max Value 

Cl 

1.6784 

(1.6764, 1.6804) 

1.6648 

1.6891 

Cl 

0.3942 

(0.3939, 0.3945) 

0.3920 

0.3956 


Table 6.8 


Values computed for an ensemble of 30 independent runs of the FREM algorithm for the 
SIR example. In each run, we obtain a cluster average, as an MLE point estimate. De¬ 
fine . For each unknown coefficient Cj in 6, we show i) the average of C, ii) a 95% 

confidence interval for the mean of C, and Hi) the minimum and maximum values of C. 


6.5. Auto-Regulatory Gene Network. The following model, taken from [6], 
has eight reaction channels and five species, 


DNA + Pa 

Cl 

DNA-P 2 , 

DNA-P 2 

Co 

DNA + P 2 

DNA 

C3 

DNA + mRN A, 

mRN A 

C4^ 

0 

P + P 

C5 

P 2 , 

P 2 

cg 

P + P 

mRN A 

C7 

mRN A + P, 

P 

cr 

0 
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is described respectively by the stoichiometric matrix and the propensity function 

C 1 DNA-P 2 \ 

C 2 DNA ■ P 2 
C 3 DNA 
C 4 mRN A 
C3P{P-l) 
ceP 2 

C 7 mRN A 
csP 

Quoting [G], “DNA, P, P 2 , and mRN A represent DNA promoters, protein gene 
products, protein dimers, and messenger RNA molecules, respectively.” This model 
has been selected to test the robustness of our FREM algorithm to deal with several 
dimensions and several reactions. Following cited works, we also set the initial state 
of the system at 

Xo = {DNA, DNA-P 2 , mRN A, P, P 2 ) = (7,3,10,10,10), 

and run the system to the final time T = 50. Synthetic data is gathered by observing 
a single trajectory generated using Oq = (0.1,0.7, 0.35,0.3, 0.1,0.9, 0.2,0.1) at uniform 
time intervals of size At=i. The data trajectory is shown in Figure 6.10. For this 
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Fig. 6.10. Data trajectory for the auto-regulatory gene network example obtained by observing 
the values of an SSA path at uniform time intervals of size At= ^. 

example we ran N=2 FREM sequences starting at ofl = OAv and ofl = 0.5 v, 
respectively, where v is the vector of K® with all its components equal to one. 

The FREM algorithm took, on average, p*=169 iterations to converge, taking 2 
days in our workstation configuration: a 12 core Intel GLNXA64 architecture and 
MATLAB version R2014a. 

We computed an ensemble of 10 independent runs and obtained 10 cluster aver¬ 
ages. We observe very small variability. Details are shown in Table 6.9. 

Remark 6.4. Observe that in the examples where the stoichiometric vectors are 
linearly dependent, the results of the phase I, i = 1,2, 3,4, lies in a hyperplane 
that reflects a certain amount of indifference in the coefficient estimations. This does 
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Average 

Average Cl at 95% 

Min Value 

Max Value 

Cl 

0.1011 

(0.1001, 0.1021) 

0.0984 

0.1033 

C2 

0.6207 

(0.6135, 0.6279) 

0.6005 

0.6328 

C3 

0.3398 

(0.3380, 0.3416) 

0.3358 

0.3441 

C4 

0.3182 

(0.3166, 0.3198) 

0.3139 

0.3213 

C5 

0.0637 

(0.0622, 0.0652) 

0.0595 

0.0687 

C6 

0.5891 

(0.5742, 0.6040) 

0.5485 

0.6357 

C7 

0.1444 

(0.1426, 0.1462) 

0.1392 

0.1483 

Cg 

0.0630 

(0.0623, 0.0637) 

Table 6.9 

0.0618 

0.0652 


Values computed for an ensemble of 10 independent runs of the FREM algorithm for the auto- 
regulatory gene network example. In each run, we obtain a cluster average, §^'‘\ as an MLE point 
estimate. Define . For each unknown coefficient Cj in 9, we show i) the average of C, 

ii) a 95% confidence interval for the mean of C, and Hi) the minimum and maximum values ofC. 


not happen in the SIR example where all the estimations in phase I are essentially the 
same. 

7. Conclusions. In this work, we addressed the problem of efficiently computing 
approximations of expectations of functionals of bridges in the context of stochastic 
reaction networks by extending the forward-reverse technique developed by Bayer 
and Schoenmakers in [3]. We also showed how to apply this technique to the statis¬ 
tical problem of inferring the set of coefficients of the propensity functions. We pre¬ 
sented a two-phase approach, namely the Forward-Reverse Expectation-Maximization 
(FREM) algorithm, in which the first phase, based on reaction-rate ODEs is determin¬ 
istic and is intended to provide a starting point that reduces the computational work 
of the second phase, namely, the Monte Carlo EM Algorithm. Our novel algorithm 
for generating bridges provides a clear advantage over shooting methods and meth¬ 
ods based on acceptance rejection techniques. Our work is illustrated with numerical 
examples. In the future, we plan to incorporate higher-order kernels and multilevel 
Monte Carlo methods in the FREM algorithm. 
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Appendix A. Algorithms. 


Algorithm 1 The F-R (forward-reverse) path generation algorithm in phase II, for 
a given time interval, [s,t]. Inputs: the initial sample size, Mq, the coefficient of 
variation threshold, cvq, the initial time, s, the final time, t, the initial observed state, 
a:(s), and the final observed state, x{t). Outputs: a sequence of the number of times 
that a reaction channel fired in the given time interval, a sequence of 

forward Euler values for the given time interval, iiuj,i)j=i)iLi a sequence of kernel 
weights for the given time interval, ((rcj,i)/=i)/hi- Notes: Here Vd is the volume of 
a d dimensional unit sphere, A.; is the sampled forward process value at time tn , 
is the sampled reverse process at time ks is the Kronecker delta kernel 
and Ke is the Epanechnikov kernel, L is the number of joined F-R paths in the time 
interval [s,t], where 0 < L < Finally, 0 < 7 < 1 and Cl is an integer greater 
than 1 (in our examples we use 2 ). 


1 

2 

3 

4 

5 

6 

7 

8 
9 

10 

11 

12 

13 

14 

15 

16 

17 

18 

19 

20 
21 
22 

23 

24 

25 

26 

27 

28 

29 

30 

31 


1 

M ^ Mo 
C i — ^ (t — s) 
while cv > cvq do 

for TO = M to M+M—1 do 


((X^.m.n,^m,n)”iT'^ ^ FW path from s to t* Starting at a;(s) 


d/) \Nim) / (/) 

Af) Af) (CU) 


^ V^m,n+1 ^m,n ,m,n ) 


^ RV path from t to t* starting at x{t) 




[t, 


(b) 


Ab) 


n' jyj 




(b) ^ 

if,b), 


end for 

^ join F-R paths (X^/’*'^(r), (rj^’*'^)/=i, Ka) 

Here, ajj = s.t. to € {1,2, and 

K 5 (X.^,m(t*),X.^,m(t*)) > 0. Similarly for r^^i. 

if L < [ 7 M] then 

E ^ covariance matrix of 

E ^ E -I- cdiag{Tt), where c is a positive constant, 
if E“^/^ not singular then 


H 


lE-l/2(M)l/d 


Vd' 


1 

repeat 

(M.,pr.,i,n;.,z)/fri ^join F-R paths 


aJA ')j=l,Ke) 


C^1.5C 

until L < ClM 
end if 
end if 

compute the coefficient of variation of (u.,i)z^i {r.j)jLi (see section 5.1) 
M ^ M + M 
M ^2M 

end while 
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Algorithm 2 The F-R path-join algorithm for phase 11. Inputs: a sequence of forward- 
backward samples for the time interval [s,t] evaluated at the intermediate time, t*, 
a sequence of the number of times that a reaction channel fired in the 
forward interval [s, t*] and in the reverse interval [t*, t], r^f.’^\ the sequence of forward 
Euler values for each reaction channel for the forward interval and for the 

backward interval [t*,i\, and the kernel k. Outputs: the number of joined 

paths, L, a sequence of the number of times that a reaction channel fired in the 
interval [s,t], the sequence of forward Euler values for each reaction 

channel for the interval [s,t], and the sequence of kernel weights for 

the interval [s,t], Notes: S' is a two dimensional sparse matrix of size 

Cx M. 

1 : L^O 

2 : for z = 1 to d do 

3: A, ^ min™[xf_{;f^(t*)J 

4: Ri ^ max™[x|{;'’^(t*)] 

5: Ei <— 1 + Bi — Ai 

6; end for 

7: for m = 1 to M do 

8: Pi ^ 1 -I- rx|'2(t*)] — Ai 

9: c ^ convert(p, E) (converts d dimensional address to {1,C}) 

10; Sc,„(c)+i ^ rn, where n(c) is the number of elements in row c of S 

11: n(c) ^ n(c) -I- 1 

12 ; end for 

13; for m = 1 to M do 

^ gat neighboring sub-boxes of s.t. bk G {1,..., C} 

15; for A: = 1 to 3"^ do 
16; for j = 1 to n{ck) do 
17; £ ^ Sc,,j 

18; 

19: if v > 0 then 

20 : L ■(— L \ 

21: Ui i — ym 

22 : ri ^ r^/'* + 

23: Wi ^ V 

24: end if 

25: end for 

26: end for 

27; end for 
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